Scientific Python antipatterns advent calendar day twelve

For today, something that is a nice illustration of the difference between computers and humans! As a reminder, I’ll post one tiny example per day with the intention that they should only take a couple of minutes to read.

If you want to read them all but can’t be bothered checking this website each day, sign up for the mailing list:

Sign up for the mailing list

and I’ll send a single email at the end with links to them all.

Expressing conditions in an awkward way

One of the things that I always tell novice programmers is that there are generally multiple different ways of expressing a test or condition, and the first one we think of might not be the best. Let’s give ourselves a list of fruits:

fruits = [
    "apple",
    "banana",
    "orange",
    "elderberry",
    "grape",
    "pineapple",
    "apricot",
    "kiwi",
    "blueberry",
    "olive", # technically a fruit!
]

and we will try to find the ones that start with the letter ‘a’. Pretty straightforward:

for fruit in fruits:
    if fruit.startswith('a'):
        print(fruit)
apple
apricot

Now let’s try to add the ones starting with ‘o’. The obvious choice here is to use or:

for fruit in fruits:
    if fruit.startswith('a') or fruit.startswith('o'):
        print(fruit)
apple
orange
apricot
olive

Not too bad. Now let’s add in the rest of the vowels:

for fruit in fruits:
    if (   fruit.startswith('a') 
        or fruit.startswith('o') 
        or fruit.startswith('e') 
        or fruit.startswith('i')
        or fruit.startswith('u')
       ):
        print(fruit)
apple
orange
elderberry
apricot
olive

Now the problem becomes quite apparent: every time we want to add another possibility, we need to copy the whole fruit.startswith method call, so the condition becomes very hard to read. We have even had to wrap the whole thing in parentheses so that we can spread it out over multiple lines.

As well as being hard to read, this version of the condition is also a bit tricky to modify. If we want to add or remove letters, we have to carefully duplicate or delete just the right bits of code.

How could we have expressed our original condition differently? For non-empty names, an equivalent is

for fruit in fruits:
    if fruit[0] == 'a':
        print(fruit)
apple
apricot

Hopefully it’s clear that this is logically the same as the startswith version. We can extend it with or just like before:

for fruit in fruits:
    if fruit[0] == 'a' or fruit[0] == 'o':
        print(fruit)
apple
orange
apricot
olive

but we could also use in and switch to a list:

for fruit in fruits:
    if fruit[0] in ['a', 'o']:
        print(fruit)
apple
orange
apricot
olive

This version of the condition has the very nice property that it’s easy to add or remove letters - we just edit the list. We can add all the vowels without making the if line too complicated:

for fruit in fruits:
    if fruit[0] in ['a', 'o', 'e', 'i', 'u']:
        print(fruit)
apple
orange
elderberry
apricot
olive

We can make it even more readable by assigning this list to a variable:

vowels = ['a', 'o', 'e', 'i', 'u']

for fruit in fruits:
    if fruit[0] in vowels:
        print(fruit)
apple
orange
elderberry
apricot
olive

making it very clear what is special about this group of letters. Since in works on strings as well as lists (for a single character, it checks whether that character appears anywhere in the string), we don’t even need a list - this will work fine with a string:

vowels = 'aeiou'

for fruit in fruits:
    if fruit[0] in vowels:
        print(fruit)
apple
orange
elderberry
apricot
olive

This is an interesting trade-off - for a single letter, startswith is probably more readable, but for many letters, in works better.
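
As an aside that goes slightly beyond the examples above: startswith also accepts a tuple of prefixes, which gives a third way to write the multi-letter test. Here is a minimal sketch, repeating the fruit list so that it runs on its own (the names vowel_prefixes and matches are just for illustration):

```python
fruits = [
    "apple", "banana", "orange", "elderberry", "grape",
    "pineapple", "apricot", "kiwi", "blueberry", "olive",
]

# startswith accepts a tuple of prefixes and returns True if any of them match
vowel_prefixes = tuple("aeiou")  # ('a', 'e', 'i', 'o', 'u')

matches = [fruit for fruit in fruits if fruit.startswith(vowel_prefixes)]
print(matches)
# ['apple', 'orange', 'elderberry', 'apricot', 'olive']
```

One difference worth knowing: fruit[0] raises an IndexError if a name is an empty string, whereas startswith simply returns False, so the tuple form may be the safer choice when the data could contain blanks.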

Here’s another similar example. Let’s classify our fruit names as short, medium or long. We will say that the names are short if they have fewer than six characters, medium if they have between six and eight, and long if they have more than eight characters. This is a condition with multiple branches so requires if/elif:

for fruit in fruits:
    if len(fruit) < 6:
        print(fruit, 'short')
    elif len(fruit) >= 6 and len(fruit) <= 8:
        print(fruit, 'medium')
    elif len(fruit) > 8:
        print(fruit, 'long')
apple short
banana medium
orange medium
elderberry long
grape short
pineapple long
apricot medium
kiwi short
blueberry long
olive short

This version follows the same logic as the description, and gives the correct classifications. But it involves a lot of values and comparisons! We have four numbers, and four different comparison operators. With this structure, it’s very easy to make mistakes like changing the short threshold in one place but forgetting to change the other one, which makes some of the fruits simply disappear from the output:

for fruit in fruits:
    if len(fruit) < 6:
        print(fruit, 'short')
    elif len(fruit) >= 7 and len(fruit) <= 8:
        print(fruit, 'medium')
    elif len(fruit) > 8:
        print(fruit, 'long')
apple short
elderberry long
grape short
pineapple long
apricot medium
kiwi short
blueberry long
olive short

Another common error here is to forget that one of the comparisons must include the equal-to bit - this version simply misses any fruit names that are exactly six or eight characters long:

for fruit in fruits:
    if len(fruit) < 6:
        print(fruit, 'short')
    elif len(fruit) > 6 and len(fruit) < 8:
        print(fruit, 'medium')
    elif len(fruit) > 8:
        print(fruit, 'long')
apple short
elderberry long
grape short
pineapple long
apricot medium
kiwi short
blueberry long
olive short

Both of these errors are quite hard to spot from just looking at the code.
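
One way to catch this kind of gap, sketched here under the assumption that every name should receive exactly one label, is to move the condition into a function and look for names that fall through. The function name classify_buggy and the None return value are illustrative choices, not part of the original example:

```python
fruits = [
    "apple", "banana", "orange", "elderberry", "grape",
    "pineapple", "apricot", "kiwi", "blueberry", "olive",
]

def classify_buggy(name):
    """Return a length label using the faulty comparisons (no equal-to)."""
    if len(name) < 6:
        return "short"
    elif len(name) > 6 and len(name) < 8:
        return "medium"
    elif len(name) > 8:
        return "long"
    return None  # no branch matched, so the name was silently skipped

# Collect every name that did not get a label
unclassified = [fruit for fruit in fruits if classify_buggy(fruit) is None]
print(unclassified)
# ['banana', 'orange']
```

A non-empty result here points straight at the names that vanished from the output, which is much easier than comparing printed lists by eye.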

But consider this version of the condition:

for fruit in fruits:
    if len(fruit) < 6:
        print(fruit, 'short')
    elif len(fruit) > 8:
        print(fruit, 'long')
    else:
        print(fruit, 'medium')
apple short
banana medium
orange medium
elderberry long
grape short
pineapple long
apricot medium
kiwi short
blueberry long
olive short

This version is logically equivalent to the first one, but because we have rearranged the order of the conditions, we can make it much simpler. Now each number is only present once - which will make it much easier to change - and we only have two comparison operators to worry about.
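
To make the "each number is only present once" point concrete, here is a minimal sketch that wraps the simplified condition in a function with named thresholds. The names SHORT_BELOW, LONG_ABOVE and classify_length are hypothetical, chosen only for this illustration:

```python
SHORT_BELOW = 6  # names with fewer characters than this are short
LONG_ABOVE = 8   # names with more characters than this are long

def classify_length(name):
    """Label a name as short, medium or long based on its length."""
    if len(name) < SHORT_BELOW:
        return "short"
    elif len(name) > LONG_ABOVE:
        return "long"
    else:
        # everything that is neither short nor long ends up here
        return "medium"

for fruit in ["kiwi", "banana", "apricot", "pineapple"]:
    print(fruit, classify_length(fruit))
# kiwi short
# banana medium
# apricot medium
# pineapple long
```

Because the final else catches everything that is neither short nor long, no name can fall through unclassified, and changing a threshold means editing exactly one line.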
